Covid Data Visualization

The data for this visualization comes from https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide and contains daily case and death counts for each country from Dec 31, 2019, to Dec 14, 2020. I created the visualizations with the Plotly Python library.

In [3]:
import pandas as pd

# Load the ECDC export and sort by date so the cumulative sums below run chronologically.
df = pd.read_excel('COVID-19-geographic-disbtribution-worldwide.xlsx')
df1 = df.sort_values(by=['dateRep'])

# String versions of the date: one per day (animation frames) and one per month (monthly totals).
df1['date'] = df1['dateRep'].dt.strftime('%d/%m/%Y')
df1['year_month'] = df1['dateRep'].dt.strftime('%Y-%m')

# Drop rows with negative counts and rows without a country code.
df1.drop(df1[(df1['cases'] < 0) | (df1['deaths'] < 0)].index, inplace=True)
df1.dropna(subset=['countryterritoryCode'], inplace=True)

# Running totals per country.
df1['cumulative_cases'] = df1.groupby(['countryterritoryCode'])['cases'].cumsum()
df1['cumulative_deaths'] = df1.groupby(['countryterritoryCode'])['deaths'].cumsum()

df1 = df1.rename(columns={"countryterritoryCode": "country_code", "countriesAndTerritories": "country", "popData2019": "population2019", "continentExp": "continent"})
df.head()  # preview of the raw, unprocessed frame
Out[3]:
   dateRep     day  month  year  cases  deaths  countriesAndTerritories  geoId  countryterritoryCode  popData2019  continentExp  Cumulative_number_for_14_days_of_COVID-19_cases_per_100000
0  2019-12-31  31   12     2019  0      0       Afghanistan              AF     AFG                   38041757.0   Asia          NaN
1  2019-12-31  31   12     2019  0      0       Algeria                  DZ     DZA                   43053054.0   Africa        NaN
2  2019-12-31  31   12     2019  0      0       Armenia                  AM     ARM                   2957728.0    Europe        NaN
3  2019-12-31  31   12     2019  0      0       Australia                AU     AUS                   25203200.0   Oceania       NaN
4  2019-12-31  31   12     2019  0      0       Austria                  AT     AUT                   8858775.0    Europe        NaN
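
To see what the cleaning and cumulative-sum steps do, here is a minimal sketch on a made-up frame. The values and the name `toy` are hypothetical; only the column names follow the ECDC file.

```python
import pandas as pd

# Hypothetical toy data: two countries, rows deliberately out of date order.
toy = pd.DataFrame({
    'dateRep': pd.to_datetime(['2020-01-02', '2020-01-01', '2020-01-03',
                               '2020-01-01', '2020-01-02', '2020-01-03']),
    'countryterritoryCode': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB'],
    'cases': [5, 2, -1, 10, 0, 4],
})

# Sort by date first so the running total accumulates chronologically.
toy = toy.sort_values(by=['dateRep'])

# Drop rows with negative counts, mirroring the notebook cell.
toy = toy[toy['cases'] >= 0]

# Running total within each country.
toy['cumulative_cases'] = toy.groupby('countryterritoryCode')['cases'].cumsum()

result = toy.sort_values(['countryterritoryCode', 'dateRep'])
print(result)
```

Without the initial sort, the running total would follow file order rather than date order, so the last row of each country would not necessarily hold its final total.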

Covid Spread Animation

The animation shows how the virus spread across countries. Until March 2020, most cases were reported in China, with a few also reported in Thailand, Mexico, Japan, South Korea, and elsewhere. At the end of February, daily new cases started rising sharply in the US and Italy, and by the end of March the virus had spread all over the world. From March onward, daily new cases in China decreased, whereas in the USA, Italy, and Spain the situation kept getting worse. The colours of the bubbles, which encode cumulative cases, make this visible. In May, daily new cases in the USA stayed high compared to the rest of the world. From June, daily new cases rose sharply in Brazil, and from July also in India. By the end of the year, the most affected countries were the USA, Brazil, and India, whereas in China daily new cases remained low compared to other countries.

In [4]:
import plotly.express as px

# The fifth root compresses the huge range of case counts into usable bubble sizes.
df1['size'] = df1['cumulative_cases'].pow(0.2)

# One animation frame per day; bubble colour and size both follow cumulative cases.
fig = px.scatter_geo(df1, locations="country_code", color="cumulative_cases",
                     hover_name="country", size="size",
                     animation_frame="date", color_continuous_scale='jet',
                     projection="equirectangular", hover_data={'cases':True, 'size':False,'country_code':False},
                     title='COVID-19 Spread (31/Dec/2019 - 14/Dec/2020)')

# Speed up playback: frame and transition durations are given in milliseconds.
fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 2
fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 2
fig.show()
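
The `size` column is the fifth root of the cumulative case count (`pow(0.2)`), which compresses the range so that small outbreaks stay visible next to very large ones. A quick check with made-up counts, where `bubble_size` is a hypothetical helper name:

```python
def bubble_size(cumulative_cases: float) -> float:
    """Fifth root, the same transform as `pow(0.2)` in the cell above."""
    return cumulative_cases ** 0.2

small, large = 10, 10_000_000  # made-up cumulative case counts

raw_ratio = large / small                                # a factor of one million
scaled_ratio = bubble_size(large) / bubble_size(small)   # a factor of about 15.8

print(f"raw ratio: {raw_ratio:,.0f}, scaled ratio: {scaled_ratio:.1f}")
```

A smaller exponent flattens the differences further, and a larger one exaggerates them; 0.2 is simply the value used in this notebook.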

Top 20 most affected countries

The bar plots show the top 20 affected countries by total cases, cases as a share of the population, total deaths, and deaths as a share of total cases (case fatality). Even though the USA recorded the most cases, the highest case-fatality rate is in Yemen (29%), followed by Mexico (9.12%).
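
The cells that build `df_c`, `df_cp`, `df_d`, and `df_dp` are not part of this export. Below is a minimal sketch of how such tables could be derived, assuming one row per country per day with the renamed columns from the first cell. The helper `top_n`, the toy values, and the exact aggregation are assumptions, not the original code.

```python
import pandas as pd

# Hypothetical daily data in the shape of df1 (values are made up).
daily = pd.DataFrame({
    'country':        ['A', 'A', 'B', 'B', 'C', 'C'],
    'continent':      ['Europe', 'Europe', 'Asia', 'Asia', 'Africa', 'Africa'],
    'population2019': [1000.0, 1000.0, 100000.0, 100000.0, 500.0, 500.0],
    'cases':          [40, 60, 300, 200, 5, 5],
    'deaths':         [1, 1, 10, 5, 2, 1],
})

# Totals per country over the whole period.
totals = (daily.groupby(['country', 'continent', 'population2019'])[['cases', 'deaths']]
               .sum()
               .reset_index())
totals['cases%'] = (totals['cases'] / totals['population2019'] * 100).round(2)
totals['deaths%'] = (totals['deaths'] / totals['cases'] * 100).round(2)

def top_n(frame: pd.DataFrame, column: str, n: int = 20) -> pd.DataFrame:
    """Return the n rows with the largest `column`, smallest first, so that
    a horizontal bar chart draws the largest bar at the top."""
    return frame.nlargest(n, column).sort_values(column)

df_c = top_n(totals, 'cases')      # total cases
df_cp = top_n(totals, 'cases%')    # cases as % of population
df_d = top_n(totals, 'deaths')     # total deaths
df_dp = top_n(totals, 'deaths%')   # deaths as % of cases (case fatality)
```

Each metric gets its own ranking, so a country can appear in one panel and be absent from another.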

In [9]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# df_c, df_cp, df_d and df_dp hold the top-20 tables; they are built in cells not included here.
# Columns 2, 4 and 6 stay empty and act as spacers between the four charts.
fig = make_subplots(rows=1, cols=7,
                    subplot_titles=("Total Cases", "", "Cases/Population(%)", "", "Total Deaths", "", "Case‑Fatality(%)"))

fig.add_trace(go.Bar(x=df_c['cases'], y=df_c['country'], orientation='h', name="Total Cases", hovertext=df_c['continent']),
              row=1, col=1)

fig.add_trace(go.Bar(x=df_cp['cases%'], y=df_cp['country'], orientation='h', name="Cases/Population(%)", hovertext=df_cp['continent']),
              row=1, col=3)

fig.add_trace(go.Bar(x=df_d['deaths'], y=df_d['country'], orientation='h', name="Total Deaths", hovertext=df_d['continent']),
              row=1, col=5)

fig.add_trace(go.Bar(x=df_dp['deaths%'], y=df_dp['country'], orientation='h', name="Case‑Fatality(%)", hovertext=df_dp['continent']),
              row=1, col=7)

fig.update_layout(title_text="Top 20 most Affected Countries", showlegend=False)
fig.show()

Top 20 most affected countries, grouped by continent

The colours show the continents. They highlight that a large share of the countries in the top 20 by case-fatality rate are in Africa, whereas a large share of the top 20 by cases/population (%) are in Europe. In other words, although a large part of the population in Europe got infected, most of those infected recovered, whereas in Africa a much larger share of COVID cases ended in death.
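
`colors` is used in the next cell but is not defined in the cells shown; it is presumably a dictionary from continent name to colour. A hypothetical version, assuming the five continent labels that remain after dropping rows without a country code (the hex values and the helper `colour_for` are arbitrary choices):

```python
# Hypothetical mapping; the keys are assumed to match the values in the
# `continent` column, and the hex colours are arbitrary.
colors = {
    'Africa':  '#636EFA',
    'America': '#EF553B',
    'Asia':    '#00CC96',
    'Europe':  '#AB63FA',
    'Oceania': '#FFA15A',
}

def colour_for(continent: str) -> str:
    """Look up a continent's colour, falling back to grey for unknown labels
    so an unexpected value in the data does not raise a KeyError."""
    return colors.get(continent, '#7F7F7F')

print(colour_for('Europe'), colour_for('Other'))
```

Using one fixed mapping for all four panels keeps each continent the same colour across the whole figure.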

In [10]:
fig = make_subplots(rows=1, cols=7,
                    subplot_titles=("Total Cases", "", "Cases/Population(%)", "", "Total Deaths", "", "Case‑Fatality(%)"))

# `colors` maps each continent name to a bar colour; it is defined in a cell not included here.
# Each entry is (top-20 table, value column, subplot column).
panels = [(df_c, 'cases', 1), (df_cp, 'cases%', 3), (df_d, 'deaths', 5), (df_dp, 'deaths%', 7)]

for frame, value_col, col in panels:
    # One trace per continent so that each continent gets its own colour.
    for t in frame['continent'].unique():
        dfp = frame[frame['continent'] == t]
        # Only the last panel contributes legend entries, which avoids duplicates.
        fig.add_trace(go.Bar(x=dfp[value_col], y=dfp['country'], orientation='h', name=t,
                             marker_color=colors[t], showlegend=(col == 7)),
                      row=1, col=col)

fig.update_layout(title_text="Top 20 most Affected Countries, sorted by continent")
fig.show()

In [11]:
# Monthly totals per country. Summing only 'cases' and 'deaths' keeps the datetime and text columns out of the aggregation.
df4 = (df1.groupby(['year_month', 'country_code', 'country', 'continent', 'population2019'])[['cases', 'deaths']]
          .sum()
          .reset_index())
# Cases as a percentage of the population, and deaths as a percentage of cases.
df4['cases%'] = (df4['cases'] / df4['population2019'] * 100).round(2)
df4['deaths%'] = (df4['deaths'] / df4['cases'] * 100).round(2)

In [12]:
fig = px.bar(df4, x="year_month", y="cases", color='continent', barmode='group',
             hover_data=['continent', 'country'], title='Monthly New Cases by Continent')
fig.show()
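
`df4` holds one row per country and month, so each continent's bar in the chart above is built from many country rows. If a single bar per continent and month is wanted instead, the country rows can be summed first. A sketch on made-up data, where `monthly` and `by_continent` are hypothetical names:

```python
import pandas as pd

# Hypothetical stand-in for df4 (values are made up).
monthly = pd.DataFrame({
    'year_month': ['2020-03', '2020-03', '2020-03', '2020-04', '2020-04', '2020-04'],
    'country':    ['Italy', 'Spain', 'India', 'Italy', 'Spain', 'India'],
    'continent':  ['Europe', 'Europe', 'Asia', 'Europe', 'Europe', 'Asia'],
    'cases':      [100, 80, 10, 90, 110, 30],
})

# One row per (month, continent): sum the country rows.
by_continent = (monthly.groupby(['year_month', 'continent'], as_index=False)['cases']
                       .sum())
print(by_continent)

# `by_continent` could then be passed to px.bar in place of df4, for example:
# px.bar(by_continent, x='year_month', y='cases', color='continent', barmode='group')
```

The trade-off is that the per-country hover information is lost once the rows are aggregated.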